conditional independence testing
Toward Scalable and Valid Conditional Independence Testing with Spectral Representations
Frohlich, Alek, Kostic, Vladimir, Lounici, Karim, Perazzo, Daniel, Pontil, Massimiliano
Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (4 more...)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Mali (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Mali (0.04)
Optimal structure learning and conditional independence testing
Gao, Ming, Wang, Yuhao, Aragam, Bryon
We establish a fundamental connection between optimal structure learning and optimal conditional independence testing by showing that the minimax optimal rate for structure learning problems is determined by the minimax rate for conditional independence testing in these problems. This is accomplished by establishing a general reduction between these two problems in the case of poly-forests, and demonstrated by deriving optimal rates for several examples, including Bernoulli, Gaussian and nonparametric models. Furthermore, we show that the optimal algorithm in these settings is a suitable modification of the PC algorithm. This theoretical finding provides a unified framework for analyzing the statistical complexity of structure learning through the lens of minimax testing.
- Europe > Ireland (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
Conditional Independence Test Based on Transport Maps
He, Chenxuan, Gao, Yuan, Zhu, Liping, Huang, Jian
Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We propose a novel framework for testing conditional independence using transport maps. At the population level, we show that two well-defined transport maps can transform the conditional independence test into an unconditional independence test, this substantially simplifies the problem. These transport maps are estimated from data using conditional continuous normalizing flow models. Within this framework, we derive a test statistic and prove its consistency under both the null and alternative hypotheses. A permutation-based procedure is employed to evaluate the significance of the test. We validate the proposed method through extensive simulations and real-data analysis. Our numerical studies demonstrate the practical effectiveness of the proposed method for conditional independence testing.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Asia > China > Hong Kong (0.04)
- (3 more...)
Amortized Conditional Independence Testing
Duong, Bao, Hoang, Nu, Nguyen, Thin
Testing for the conditional independence structure in data is a fundamental and critical task in statistics and machine learning, which finds natural applications in causal discovery-a highly relevant problem to many scientific disciplines. Existing methods seek to design explicit test statistics that quantify the degree of conditional dependence, which is highly challenging yet cannot capture nor utilize prior knowledge in a data-driven manner. In this study, an entirely new approach is introduced, where we instead propose to amortize conditional independence testing and devise ACID ( Amortized C onditional In D ependence test)- a novel transformer-based neural network architecture that learns to test for conditional independence . ACID can be trained on synthetic data in a supervised learning fashion, and the learned model can then be applied to any dataset of similar natures or adapted to new domains by fine-tuning with a negligible computational cost. Our extensive empirical evaluations on both synthetic and real data reveal that ACID consistently achieves state-of-the-art performance against existing baselines under multiple metrics, and is able to generalize robustly to unseen sample sizes, dimensionalities, as well as non-linearities with a remarkably low inference time.
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reviews: Conditional Independence Testing using Generative Adversarial Networks
Originality This paper presents a new way to use GANs in hypothesis testing. It was very interesting to use GANs to construct a null distribution that adapts to the dataset without strong assumptions. The proposed method can be used for feature selection and explainable neural networks. The quantitative experimental results are limited to synthetic data but comprehensive and expected behaviours of the GCIT are observed in synthetic experiments. The only experimental result with real data is shown but it is hard to tell which result is more accurate and powerful.
Reviews: Conditional Independence Testing using Generative Adversarial Networks
The paper proposes a novel and interesting GAN-based method for conditional independence test. While there is some concern on applying the black-box method to rigorous statistical tests, I still think the paper includes a new and significant idea to the important but difficult problem of conditional independence. I would like the authors to reflect the review comments in making the final version.
Conditional independence testing under misspecified inductive biases
Conditional independence (CI) testing is a fundamental and challenging task in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step; we refer to this class of tests as regression-based tests. Although these methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors of interest, their behavior is less understood when they fail due to misspecified inductive biases; in other words, when the employed models are not flexible enough or when the training algorithm does not induce the desired predictors. Then, we study the performance of regression-based CI tests under misspecified inductive biases. Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors.
K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing
Conditional independence (CI) testing is a fundamental task in statistics and machine learning, but its effectiveness is hindered by the challenges posed by high-dimensional conditioning variables and limited data samples. This article introduces a novel testing approach to address these challenges and enhance control of the type I error while achieving high power under alternative hypotheses. The proposed approach incorporates a computationally efficient classifier-based conditional mutual information (CMI) estimator, capable of capturing intricate dependence structures among variables. To approximate a distribution encoding the null hypothesis, a k -nearest-neighbor local sampling strategy is employed. An important advantage of this approach is its ability to operate without assumptions about distribution forms or feature dependencies. Furthermore, it eliminates the need to derive asymptotic null distributions for the estimated CMI and avoids dataset splitting, making it particularly suitable for small datasets.